Manipulating Time Series Data

Lucas S. Macoris

Outline

Coding Replications

For coding replications, whenever applicable, please follow this page or hover over the specific slides containing coding chunks.

  1. Ensure that you have your session properly set-up according to the instructions outlined in the course webpage
  2. Along with the slides, this lecture will also contain a replication file, in .qmd format, with a thorough discussion of all the examples that have been showcased. This file, which will be posted on eClass®, can be downloaded and replicated on your side. To do that, download the file, open it in RStudio, and render the Quarto document using the Render button (shortcut: Ctrl+Shift+K).
  3. At the end of this lecture, you will be prompted with a hands-on exercise to test your skills using the tools you’ve learned as you made your way through the slides. A suggested solution will be provided in the replication file.

Organizing Financial Data

  • In the previous lecture, you worked your way through the exercises by using the amazing dplyr functionalities on data.frames

  • In some cases, you had to do some workarounds with drop_na(), slice_tail() and lag() simply because you were manipulating time series data

  • In this lecture, you will be introduced to a particular type of class in R: xts

Definition

xts is an R package that provides an extension of the zoo class, a class with methods for totally ordered, indexed observations - in particular, time series

  • With xts, you get a lot of flexibility in handling time series observations that are of interest to financial analysts, such as:

    1. Subsetting data by years/months/days
    2. Calculating rolling functions (e.g., yearly averages)
    3. Aggregating data at different intervals (e.g., converting daily to weekly prices)
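As a quick sketch of these three capabilities, the chunk below builds a small xts object from simulated, purely hypothetical prices and applies each operation in turn:

```r
library(xts) #also loads zoo

#Hypothetical data: 60 daily 'prices' starting in January 2024
dates=seq(as.Date('2024-01-01'),by='day',length.out=60)
prices=xts(100+cumsum(rnorm(60)),order.by=dates)

#1. Subset by years/months/days using ISO-8601 strings
prices['2024-02']

#2. Calculate rolling functions (e.g., a 5-day rolling average)
rollmean(prices,k=5)

#3. Aggregate at different intervals (daily prices to weekly OHLC bars)
to.weekly(prices)
```

Note how the date index does the heavy lifting: `'2024-02'` selects a whole month without any filter logic on your side.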

Question: but wait, why are we departing from dplyr?

Bridging the tidyverse with time series

  • Unfortunately, there is an issue: the tidyverse is not fully designed to work with time series classes, such as xts and zoo

  • As a consequence, you won’t be able to use a lot of interesting functionalities that would perfectly apply to time series

    1. But don’t you worry, I got you covered: the tidyquant package integrates the best resources for collecting and analyzing financial data

    2. It integrates several financial packages, like zoo, xts, quantmod, TTR, and PerformanceAnalytics, with the tidy data infrastructure of the tidyverse, allowing for seamless interaction between them

  • You can now perform complete financial analyses using the same functionalities you’ve learned so far!

The tidyquant package, a short video

The tq_mutate() function

Definition

The tq_mutate() function adds new variables to an existing tibble:

tq_mutate(.data, #The object on which you are performing the calculations
       select, #The columns to send to the mutation function
       mutate_fun, #The mutation function, from either the xts, quantmod, or TTR package
       col_rename #A string or character vector used to rename the resulting columns
       )
  1. The main advantage is that the results are returned as a tibble, so the function can be used with the tidyverse
  2. It is used when you expect additional columns to be added to the resulting data frame
  3. You can use several time series related functions from other R packages - call tq_mutate_fun_options() to see the list of available options
  4. All in all, it is similar in spirit to mutate()
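As a minimal sketch, the call below uses FANG - a daily stock price dataset bundled with tidyquant - to append a 10-day simple moving average; the window length and the SMA_10 name are illustrative choices:

```r
library(tidyquant)
library(tidyverse)

FANG%>%
  select(symbol,date,adjusted)%>%
  group_by(symbol)%>%
  #Appends a new SMA_10 column; all original rows and columns are kept
  tq_mutate(select = adjusted, mutate_fun = SMA, n = 10, col_rename = 'SMA_10')
```

The output has exactly as many rows as the input - that is the mutate() spirit mentioned below.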

The tq_transmute() function

Definition

The tq_transmute() function returns only newly created columns and is typically used when the periodicity changes.

tq_transmute(.data, #The object on which you are performing the calculations
       select, #The columns to send to the mutation function
       mutate_fun, #The mutation function, from either the xts, quantmod, or TTR package
       col_rename #A string or character vector used to rename the resulting columns
       )
  1. tq_transmute() works exactly like tq_mutate() except it only returns the newly created columns
  2. This is helpful when changing periodicity where the new columns would not have the same number of rows as the original tibble
  3. All in all, it is similar in spirit to summarize()
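A minimal sketch of such a periodicity change, again using the FANG dataset bundled with tidyquant: daily prices go in, and only one return per symbol-month comes out (the monthly_return name is an illustrative choice):

```r
library(tidyquant)
library(tidyverse)

FANG%>%
  select(symbol,date,adjusted)%>%
  group_by(symbol)%>%
  #Collapses daily prices into monthly returns: far fewer rows than the input
  tq_transmute(select = adjusted,
               mutate_fun = monthlyReturn,
               col_rename = 'monthly_return')
```

Because the row count shrinks, the original daily columns could not be carried along - which is exactly why tq_transmute() drops them.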

Working with time series objects, Exercise I

  • An immediate useful example of using a time series specific functionality with a tidyverse logic relates to filtering:
    1. Sometimes, we may be interested in getting only a subset of the data (for example, only GOOG information)
    2. Furthermore, we may be interested in subsetting only a specific time frame for our analysis
  • It is relatively straightforward to do it with tidyquant:
    1. Use filter() to select only rows where symbol=='GOOG'
    2. In the same call, filter for date>= min_date and date<=max_date

Working with time series objects, Exercise I

library(tidyquant)
library(tidyverse)

#Set up the list of assets
assets=c('AMZN','GOOG','META','GME')

#Filter out
assets%>%
  tq_get()%>%
  filter(symbol=='GOOG',
         date>='2020-01-01',
         date<='2024-12-31')
# A tibble: 1,258 × 8
   symbol date        open  high   low close   volume adjusted
   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
 1 GOOG   2020-01-02  67.1  68.4  67.1  68.4 28132000     68.1
 2 GOOG   2020-01-03  67.4  68.6  67.3  68.0 23728000     67.8
 3 GOOG   2020-01-06  67.5  69.8  67.5  69.7 34646000     69.5
 4 GOOG   2020-01-07  69.9  70.1  69.5  69.7 30054000     69.4
 5 GOOG   2020-01-08  69.6  70.6  69.5  70.2 30560000     70.0
 6 GOOG   2020-01-09  71.0  71.4  70.5  71.0 30018000     70.7
 7 GOOG   2020-01-10  71.4  71.7  70.9  71.5 36414000     71.2
 8 GOOG   2020-01-13  71.8  72.0  71.3  72.0 33046000     71.7
 9 GOOG   2020-01-14  72.0  72.1  71.4  71.5 31178000     71.3
10 GOOG   2020-01-15  71.5  72.1  71.5  72.0 25654000     71.7
# ℹ 1,248 more rows

Working with time series objects, Exercise II

  • Another example of using a time series specific functionality is working with leads and lags:
    1. Sometimes, we need to shift our variables by a specific interval, like getting the previous day’s price
    2. Say, for example, that you want to understand how S&P 500 return levels relate to NFLX returns one week ahead
  • It is relatively straightforward to do it with tidyquant:
    1. Download S&P 500 and NFLX data using the tq_get() function
    2. Use tq_transmute() to compute the weekly returns for each security based on daily data
    3. Use tq_transmute() along with lag.xts to generate a lagged series of S&P 500 returns

Working with time series objects, Exercise II

#Assuming you have already loaded the tidyquant and the tidyverse packages

#Netflix Data
NFLX=tq_get('NFLX')%>%
  #Select only the necessary columns
  select(date,symbol,adjusted)%>%
  #Apply the weeklyReturn function and call the new column 'NFLX'
  tq_transmute(mutate_fun = weeklyReturn,
               col_rename = 'NFLX')

#S&P Data
SP500=tq_get('^GSPC')%>%
  #Select only the necessary columns
  select(date,symbol,adjusted)%>%
  #Apply the weeklyReturn function and call the new column 'SP500'
  tq_transmute(mutate_fun = weeklyReturn,
               col_rename = 'SP500')%>%
  #Apply the lag function for n=1 week and call the new column 'SP500'
  tq_transmute(mutate_fun = lag.xts,
            n=1,
            col_rename = 'SP500')%>%
  #Drop all rows with NA information (row 1, in this case)
  drop_na()

#Merge Data 
inner_join(NFLX,SP500)
# A tibble: 530 × 3
   date           NFLX    SP500
   <date>        <dbl>    <dbl>
 1 2015-01-09 -0.0563   0      
 2 2015-01-16  0.0244  -0.00651
 3 2015-01-23  0.297   -0.0124 
 4 2015-01-30  0.00992  0.0160 
 5 2015-02-06  0.00579 -0.0277 
 6 2015-02-13  0.0489   0.0303 
 7 2015-02-20  0.0260   0.0202 
 8 2015-02-27 -0.00688  0.00635
 9 2015-03-06 -0.0438  -0.00275
10 2015-03-13 -0.0346  -0.0158 
# ℹ 520 more rows

Working with time series objects, Exercise III

  • Finance practitioners are often asked to perform analysis on a rolling basis:
    1. We may want to calculate a given signal on day \(t\) based on past \(x\) periods of information
    2. Say, for example, that you want to calculate a simple and an exponential moving average of adjusted prices over the past 5 days for a given stock
  • It is relatively straightforward to do it with tidyquant:
    1. Download stock data using the tq_get() function
    2. Use tq_mutate() twice along with the SMA() and EMA() functions setting n=5

Working with time series objects, Exercise III

#Assuming you have already loaded the tidyquant and the tidyverse packages

#Set up the list of assets
assets=c('AMZN')

assets%>%
  tq_get()%>%
  select(date,symbol,adjusted)%>%
  group_by(symbol)%>%
  tq_mutate(adjusted, mutate_fun = SMA, n = 5)%>%
  tq_mutate(adjusted, mutate_fun = EMA, n = 5)
# A tibble: 2,551 × 5
# Groups:   symbol [1]
   symbol date       adjusted   SMA   EMA
   <chr>  <date>        <dbl> <dbl> <dbl>
 1 AMZN   2015-01-02     15.4  NA    NA  
 2 AMZN   2015-01-05     15.1  NA    NA  
 3 AMZN   2015-01-06     14.8  NA    NA  
 4 AMZN   2015-01-07     14.9  NA    NA  
 5 AMZN   2015-01-08     15.0  15.0  15.0
 6 AMZN   2015-01-09     14.8  14.9  15.0
 7 AMZN   2015-01-12     14.6  14.8  14.8
 8 AMZN   2015-01-13     14.7  14.8  14.8
 9 AMZN   2015-01-14     14.7  14.8  14.8
10 AMZN   2015-01-15     14.3  14.6  14.6
# ℹ 2,541 more rows

Working with time series objects, Exercise IV

  • Lastly, financial analysts often cover a collection of securities on a rolling basis

  • For example, a buy-side analyst will monitor stocks from a given industry so as to understand which ones are overvalued, and which ones are undervalued

  • Say, for example, that you want to focus on a subset of 4 stocks, and you need to compare the cumulative return up to the latest closing price

  • It is easy to integrate the tidyquant functions along with the group_by() function you’ve learned when working with dplyr:

    1. Get the information using tq_get()
    2. Group the data by symbol
    3. Apply the tq_mutate() and tq_transmute() functions to pass time series functions to the data - in this case, the dailyReturn() and Return.cumulative() functions

Working with time series objects, Exercise IV

#Assuming you have already loaded the tidyquant and the tidyverse packages

#Set up the list of assets
assets=c('AMZN','GOOG','META','GME')

assets%>%
  tq_get()%>%
  select(date,symbol,adjusted)%>%
  group_by(symbol)%>%
  tq_mutate(adjusted, mutate_fun = dailyReturn,col_rename = 'daily_return')%>%
  tq_transmute(daily_return,mutate_fun = Return.cumulative)%>%
  mutate(across(where(is.numeric),percent,big.mark='.'))%>%
  setNames(c('Ticker','Cumulative Return up-to-date'))
# A tibble: 4 × 2
# Groups:   Ticker [4]
  Ticker `Cumulative Return up-to-date`
  <chr>  <chr>                         
1 AMZN   1 279%                        
2 GOOG   595%                          
3 META   755%                          
4 GME    298%                          

Wrapping-up on tidyquant

  • There is so much you can use from tidyquant in your journey as a quantitative financial analyst:

    1. I strongly recommend looking at all the predefined functions supported by tidyquant - click here for a detailed discussion around all supported functions

    2. You can also customize your own functions that work with time series (for example, your secret trading indicator) and pass it over through tq_mutate() or tq_transmute()

  • In the package’s official website, you can find a variety of examples to nurture your creativity around what you can do using this package

  • All in all, that’s the motto: time series analysis made easy with tidyquant and the tidyverse
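As a sketch of that customization point, the chunk below defines my_signal, a made-up indicator (the rolling high-low range scaled by the rolling mean - both the name and the formula are hypothetical), and passes it through tq_mutate() via rollapply() from the zoo package, using the FANG dataset bundled with tidyquant:

```r
library(tidyquant)
library(tidyverse)

#A hypothetical custom indicator: rolling range scaled by the rolling mean
my_signal=function(x) (max(x)-min(x))/mean(x)

FANG%>%
  select(symbol,date,adjusted)%>%
  group_by(symbol)%>%
  #rollapply slides my_signal over 5-day windows of adjusted prices
  tq_mutate(select = adjusted,
            mutate_fun = rollapply,
            width = 5,
            FUN = my_signal,
            col_rename = 'signal')
```

Any function that maps a window of numbers to a value can be slotted in the same way - your secret indicator included.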

Does getting ripped increase returns?

\(\rightarrow\) See Deadlift: The ETF World’s Latest Headscratcher

Hands-On Exercise

  • Your manager (who has not lifted any weights in the last 5 years) wants to replicate the returns of the Deadlift ETF from 2020 to 2024. Your job is to create a simple table of yearly returns comparing the Deadlift ETF vis-à-vis the S&P 500 Index

  • Follow the instructions and answer the following questions:

    1. When looking at the yearly results from both the Deadlift ETF and the S&P 500, which one performed better?
    2. What are the potential explanations for the result you have found?
  • To answer these questions, you will be using a combination of the dplyr and tidyquant functions you have learned so far

  • The expected result is a data.frame object that shows both the Deadlift ETF as well as the S&P 500 returns (columns) on a yearly basis (rows)

\(\rightarrow\) Suggested solution will be provided in the replication file for this lecture.

Hands-On Exercise, continued

Exercise

Before you start, make sure to have the tidyverse and the tidyquant packages loaded in your session. Following the instructions from the previous lectures, you can either make a direct call to each package, library(tidyverse) and library(tidyquant), or copy-paste the script from the course’s official website.

  1. Use tq_get() to load information from the S&P 500 Index and the Deadlift ETF constituents in two separate objects. You can use the ticker ^GSPC to retrieve information for the index, and you can pass a vector c('ticker1','ticker2',...,'ticker_n') to get information on the Deadlift ETF constituents
  2. Filter for observations starting between 2020 (beginning of) and 2024 (end of) using the from and to arguments of the tq_get() function
  3. Group the Deadlift ETF data by symbol using the group_by() function
  4. For both data sets, create a yearly_ret variable that calculates the yearly return of a given security. You can use the tq_transmute() function, passing the yearlyReturn() function along the chain
  5. For the Deadlift data set, regroup the data by date and calculate the Deadlift returns using a summarize() call (Hint: it is an equally weighted portfolio)
  6. Merge both datasets using inner_join()

Solution

# Set up the list of assets
deadlift=c('META','AMZN','GS','UBER','MSFT','AAPL','BLK','NVDA')

#Set up the starting and ending dates
start='2020-01-01'
end='2024-12-31'

#Step 1: Read the Deadlift data using tidyquant
Deadlift_Performance=deadlift%>%
  tq_get(from=start,to=end)%>%
  #Select only the columns of interest
  select(symbol,date,adjusted)%>%
  #Group by symbol
  group_by(symbol)%>%
  #Use tq_transmute to aggregate and calculate yearly returns
  tq_transmute(select=adjusted,
               mutate_fun=yearlyReturn,
               col_rename = 'Deadlift')%>%
  #Group by date
  group_by(date)%>%
  #Summarize average return (since it is an equally-weighted portfolio)
  summarize(Deadlift=mean(Deadlift,na.rm=TRUE))

#Step 2: Read the S&P 500 data using tidyquant
SP500_Performance=tq_get('^GSPC',from=start,to=end)%>%
  #Select only the columns of interest
  select(symbol,date,adjusted)%>%
  #Group by symbol
  group_by(symbol)%>%
  #Use tq_transmute to aggregate and calculate yearly returns
  tq_transmute(select=adjusted,
               mutate_fun=yearlyReturn,
               col_rename = 'SP500')%>%
  ungroup()%>%
  select(-symbol)
    
#Merge
SP500_Performance%>%
  inner_join(Deadlift_Performance)%>%
  mutate(across(where(is.numeric),percent))%>%
  mutate(date=year(date))%>%
  setNames(c('Year','S&P 500','DeadLift ETF'))
# A tibble: 5 × 3
   Year `S&P 500` `DeadLift ETF`
  <dbl> <chr>     <chr>         
1  2020 15.29%    57.9%         
2  2021 26.89%    37.2%         
3  2022 -19.44%   -36.0%        
4  2023 24.23%    100.5%        
5  2024 23.84%    52.1%         

References

Scheuch, Christoph, Stefan Voigt, and Patrick Weiss. 2023. Tidy Finance with R. Chapman & Hall/CRC. https://www.tidy-finance.org/r/.
Wickham, Hadley, Mine Cetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science. O’Reilly Media. https://r4ds.had.co.nz/.